
    A Modeling of Singing Voice Robust to Accompaniment Sounds and Its Application to Singer Identification and Vocal-Timbre-Similarity-Based Music Information Retrieval

    This paper describes a method of modeling the characteristics of a singing voice from polyphonic musical audio signals that include the sounds of various musical instruments. Because singing voices play an important role in musical pieces with vocals, such a representation is useful for music information retrieval (MIR) systems. The main problem in modeling the characteristics of a singing voice is the negative influence of accompaniment sounds. To solve this problem, we developed two methods: accompaniment sound reduction and reliable frame selection. The former makes it possible to calculate feature vectors that represent the spectral envelope of a singing voice after reducing accompaniment sounds. It first extracts the harmonic components of the predominant melody from sound mixtures and then resynthesizes the melody using a sinusoidal model driven by these components. The latter method then estimates the reliability of each frame of the obtained melody (i.e., the influence of accompaniment sounds) by using two Gaussian mixture models (GMMs), one for vocal and one for nonvocal frames, to select the reliable vocal portions of musical pieces. Finally, each song is represented by a GMM trained on its reliable frames. This new representation of the singing voice is demonstrated to improve the performance of an automatic singer identification system and to enable an MIR system based on vocal timbre similarity.
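    The reliable-frame-selection step can be sketched as follows. This is a toy illustration, not the authors' implementation: single diagonal Gaussians stand in for the vocal/nonvocal GMMs, and the feature dimensions, parameters, and data are all invented.

    ```python
    import numpy as np

    def gaussian_loglik(x, mean, var):
        # Log-likelihood of feature vectors x under a diagonal Gaussian.
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

    def select_reliable_frames(features, vocal_mean, vocal_var,
                               nonvocal_mean, nonvocal_var, threshold=0.0):
        """Keep frames whose vocal-vs-nonvocal log-likelihood ratio
        exceeds the threshold (single Gaussians stand in for GMMs)."""
        llr = (gaussian_loglik(features, vocal_mean, vocal_var)
               - gaussian_loglik(features, nonvocal_mean, nonvocal_var))
        return features[llr > threshold], llr

    # Toy data: 50 "vocal" frames near +1, 50 "nonvocal" frames near -1.
    rng = np.random.default_rng(0)
    frames = np.vstack([rng.normal(1.0, 0.3, (50, 4)),
                        rng.normal(-1.0, 0.3, (50, 4))])
    reliable, llr = select_reliable_frames(
        frames,
        vocal_mean=np.ones(4), vocal_var=np.full(4, 0.09),
        nonvocal_mean=-np.ones(4), nonvocal_var=np.full(4, 0.09))
    print(len(reliable))  # the 50 vocal frames survive
    ```

    In practice the two models would be GMMs trained on labeled vocal and nonvocal frames, and the threshold would be tuned on held-out data.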

    Nonparametric Bayesian Dereverberation of Power Spectrograms Based on Infinite-Order Autoregressive Processes

    This paper describes a monaural audio dereverberation method that operates in the power spectrogram domain. The method is robust to different kinds of source signals, such as speech or music, and requires little manual intervention and no detailed prior knowledge of the room acoustics. It is based on a non-conjugate Bayesian model of the power spectrogram that extends the idea of multichannel linear prediction to the power spectrogram domain, formulating reverberation as a non-negative, infinite-order autoregressive process. To this end, the power spectrogram is interpreted as histogram count data, which allows a nonparametric Bayesian model to be used as the prior for the autoregressive process, letting the effective number of active components grow without bound with the complexity of the data. To approximate the marginal posterior distribution, an iterative, convergent algorithm inspired by the variational Bayes method is formulated using the minorization-maximization technique. Both objective and subjective evaluations show an advantage over other methods based on the power spectrum. We also apply the method to a music information retrieval task and demonstrate its effectiveness.
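    The reverberation model at the heart of the method can be sketched in simplified form. This is an illustrative finite-order, fixed-weight version of the non-negative autoregressive process; the paper instead places a nonparametric Bayesian prior on the weights and infers them:

    ```python
    import numpy as np

    def reverberate_power(dry, weights):
        """Forward model: the reverberant power spectrogram is the dry
        power plus a non-negative weighted sum of its own past frames
        (a finite truncation of the infinite-order AR process)."""
        wet = np.zeros_like(dry)
        for t in range(dry.shape[0]):
            wet[t] = dry[t]
            for k, w in enumerate(weights, start=1):
                if t - k >= 0:
                    wet[t] += w * wet[t - k]
        return wet

    def dereverberate_power(wet, weights):
        # Inverse of the model above: subtract the predicted tail,
        # clipping at zero to keep the power spectrogram non-negative.
        dry = wet.copy()
        for t in range(wet.shape[0]):
            for k, w in enumerate(weights, start=1):
                if t - k >= 0:
                    dry[t] -= w * wet[t - k]
        return np.clip(dry, 0.0, None)

    # A single impulse of power 1 leaves an exponentially decaying tail.
    dry = np.zeros((6, 1))
    dry[0, 0] = 1.0
    wet = reverberate_power(dry, weights=[0.5])
    print(wet[:, 0])  # → [1. 0.5 0.25 0.125 0.0625 0.03125]
    ```

    With known weights the inverse recovers the dry spectrogram exactly; the hard part the paper addresses is estimating the weights blindly.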

    Classification of Known and Unknown Environmental Sounds Based on Self-Organized Space Using a Recurrent Neural Network

    Our goal is to develop a system that learns and classifies environmental sounds for robots working in the real world. In the real world, two main restrictions apply to learning: (i) robots must learn from only a small amount of data in a limited time because of hardware restrictions, and (ii) the system has to adapt to unknown data, since it is virtually impossible to collect samples of all environmental sounds. We used a neuro-dynamical model to build a prediction and classification system. This model can self-organize sound classes into parameters by learning samples. The sound classification space constructed from these parameters is structured according to the sound-generation dynamics and forms clusters not only for known classes but also for unknown classes. The proposed system performs classification by searching this space. In our experiments, we evaluated the classification accuracy for both known and unknown sound classes.
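    The "adapt to unknown data" behavior can be illustrated with a toy stand-in for searching the self-organized space: nearest-cluster lookup with a rejection threshold. The cluster centers, labels, and threshold here are all invented for illustration; in the paper the space is self-organized by the recurrent network.

    ```python
    import numpy as np

    def classify(x, centroids, labels, reject_dist):
        """Nearest-centroid lookup in a learned parameter space; points
        far from every known cluster are flagged as an unknown class."""
        d = np.linalg.norm(centroids - x, axis=1)
        i = int(np.argmin(d))
        return labels[i] if d[i] <= reject_dist else "unknown"

    centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # hypothetical clusters
    labels = ["door", "bell"]
    print(classify(np.array([0.2, -0.1]), centroids, labels, 1.0))   # → door
    print(classify(np.array([10.0, -8.0]), centroids, labels, 1.0))  # → unknown
    ```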

    Complex and Transitive Synchronization in a Frustrated System of Calling Frogs

    This letter reports synchronization phenomena and mathematical modeling of a frustrated system of living beings, namely Japanese tree frogs (Hyla japonica). While an isolated male Japanese tree frog calls nearly periodically, he can hear sounds including the calls of other males; the spontaneous calling behavior of interacting males can therefore be understood as a system of coupled oscillators. We construct a simple but biologically reasonable model based on experimental results for two frogs, extend the model to a system of three frogs, and theoretically predict the occurrence of rich synchronization phenomena, such as triphase synchronization and 1:2 antiphase synchronization. In addition, we experimentally verify the theoretical prediction through ethological experiments on the calling behavior of three frogs and time-series analysis of the recorded sound data. Note that the calling behavior of three male Japanese tree frogs is frustrated because almost perfect antiphase synchronization is robustly observed in a system of two males; thus, the nonlinear dynamics of the three-frog system are far from trivial.
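    The coupled-oscillator picture can be illustrated with a minimal simulation. This is a generic repulsive Kuramoto model, not the authors' fitted model: each "frog" is a phase oscillator that avoids calling in phase with the others, and for three identical oscillators the system settles into the triphase (splay) state, with all pairwise phase gaps near 2π/3.

    ```python
    import numpy as np

    def simulate_frogs(n=3, coupling=-1.0, omega=2 * np.pi,
                       dt=0.001, steps=100_000, seed=1):
        """Euler integration of repulsive Kuramoto dynamics:
        dtheta_i/dt = omega + (K/n) * sum_j sin(theta_j - theta_i),
        with K < 0 so in-phase calling is unstable."""
        rng = np.random.default_rng(seed)
        theta = rng.uniform(0, 2 * np.pi, n)
        for _ in range(steps):
            diff = theta[None, :] - theta[:, None]  # theta_j - theta_i
            theta += dt * (omega + coupling / n * np.sin(diff).sum(axis=1))
        return theta % (2 * np.pi)

    theta = simulate_frogs()
    order = np.sort(theta)
    gaps = np.diff(np.concatenate([order, [order[0] + 2 * np.pi]]))
    print(gaps)  # all three gaps approach 2*pi/3: tri-phase synchronization
    ```

    Two repulsively coupled oscillators instead lock in antiphase, which is why adding a third frog frustrates the system.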

    Developmental Human-Robot Imitation Learning of Drawing with a Neuro Dynamical System

    This paper deals mainly with the influence of teaching style and of developmental processes in the learning model on the acquired representations (primitives). We investigate these influences by introducing a hierarchical recurrent neural network as the robot's model, together with a form of motionese (a caregiver's use of simpler and more exaggerated motions when showing a task to an infant). We modified a Multiple Timescales Recurrent Neural Network (MTRNN) to serve as the robot's self-model; the number of layers in the MTRNN increases as it learns more complex events. We evaluate our approach with the humanoid robot "Actroid" in an imitation experiment in which a human caregiver gives the robot the task of pushing two buttons. The experimental results and analysis confirm that learning with phased teaching and structuring enables the robot to acquire clear motion primitives, visible as activities in the fast context layer of the MTRNN, and to handle unknown motions.
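    The timescale hierarchy that an MTRNN builds on can be sketched with its basic leaky-integrator unit update; the weights, inputs, and time constants below are placeholders, not values from the paper.

    ```python
    import numpy as np

    def mtrnn_step(u, h_input, W, tau):
        """One discrete leaky-integrator update per unit:
        u <- (1 - 1/tau) * u + (1/tau) * (W @ tanh(u) + input).
        Fast units (small tau) track input quickly; slow units
        (large tau) change gradually - the timescale hierarchy."""
        return (1.0 - 1.0 / tau) * u + (1.0 / tau) * (W @ np.tanh(u) + h_input)

    u = np.zeros(2)
    tau = np.array([2.0, 100.0])   # one fast unit, one slow unit
    W = np.zeros((2, 2))           # placeholder recurrent weights
    u = mtrnn_step(u, np.ones(2), W, tau)
    print(u)  # → [0.5 0.01]: the fast unit jumps, the slow unit barely moves
    ```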

    A robot uses its own microphone to synchronize its steps to musical beats while scatting and singing

    Musical beat tracking is an effective technology for human-robot interaction such as musical sessions. Since such interaction should be performed naturally in various environments, musical beat tracking for a robot should cope with noise sources such as environmental noise, its own motor noise, and its own voice, using only its own microphone. This paper addresses a musical beat-tracking robot that can step, scat, and sing in time with musical beats by using its own microphone. To realize such a robot, we propose a robust beat-tracking method that introduces two key techniques: spectro-temporal pattern matching and echo cancellation. The former realizes robust tempo estimation with a shorter window length and can therefore adapt quickly to tempo changes. The latter cancels self-generated noise from stepping, scatting, and singing. We implemented the proposed beat-tracking method on Honda ASIMO. Experimental results showed ten-times-faster adaptation to tempo changes and high robustness of beat tracking against stepping, scatting, and singing noise. We also demonstrated that the robot can time its steps to musical beats while scatting or singing.
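    The tempo-estimation idea can be sketched in simplified form by matching a one-dimensional onset envelope against itself shifted by candidate beat intervals. The paper matches two-dimensional spectro-temporal patterns; this 1-D normalized-correlation version is only illustrative, and the signal below is synthetic.

    ```python
    import numpy as np

    def estimate_tempo(onset_env, frame_rate, bpm_range=(90, 180)):
        """Score candidate inter-beat lags by how well the onset
        envelope matches itself shifted by that lag, and return
        the tempo of the best-matching lag."""
        min_lag = int(frame_rate * 60.0 / bpm_range[1])
        max_lag = int(frame_rate * 60.0 / bpm_range[0])
        best_lag, best_score = min_lag, -np.inf
        for lag in range(min_lag, max_lag + 1):
            a, b = onset_env[:-lag], onset_env[lag:]
            score = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
            if score > best_score:
                best_lag, best_score = lag, score
        return 60.0 * frame_rate / best_lag

    # Synthetic onset envelope: an impulse every 0.5 s at 100 frames/s.
    env = np.zeros(400)
    env[::50] = 1.0
    print(estimate_tempo(env, frame_rate=100))  # → 120.0
    ```

    Restricting the search to less than one tempo octave (here 90 to 180 BPM) sidesteps the usual half/double-tempo ambiguity of correlation-based estimators.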

    Visualizing Phonotactic Behavior of Female Frogs in Darkness

    Many animals use sounds produced by conspecifics for mate identification. Female insects and anuran amphibians, for instance, use acoustic cues to localize, orient toward, and approach conspecific males prior to mating. Here we present a novel technique that utilizes multiple, distributed sound-indication devices and a miniature LED backpack to visualize and record the nocturnal phonotactic approach of females of the Australian orange-eyed tree frog (Litoria chloris), both in a laboratory arena and in the animal's natural habitat. Continuous high-definition digital recording of the LED coordinates provides automatic tracking of the female's position, and the illumination patterns of the sound-indication devices allow us to discriminate multiple sound sources, including loudspeakers broadcasting calls as well as calls emitted by individual male frogs. This innovative methodology is widely applicable to the study of phonotaxis and the spatial structure of acoustically communicating nocturnal animals.